Support Managed Iceberg streaming writes #32451

Merged: 3 commits merged into apache:master on Sep 19, 2024

ahmedabu98
Contributor

@ahmedabu98 ahmedabu98 commented Sep 13, 2024

Apply some windowing and add a new triggering_frequency_seconds parameter to support streaming writes to Iceberg tables.

The triggering frequency controls how often we commit data and create new snapshots.
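As a rough illustration of how the new parameter might be supplied, the sketch below builds a configuration map of the kind that Beam's Managed API accepts through `Managed.write(Managed.ICEBERG).withConfig(...)`. The class name `IcebergStreamingConfigSketch`, the helper `buildConfig`, and every value (table name, catalog name, warehouse path, 60-second frequency) are hypothetical placeholders. The keys other than `triggering_frequency_seconds` are assumed from the Managed Iceberg configuration and are not stated in this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class IcebergStreamingConfigSketch {

  // Builds a config map for a Managed Iceberg write.
  // All values are placeholders; only "triggering_frequency_seconds" is named in this PR.
  static Map<String, Object> buildConfig(String table, int triggeringFrequencySeconds) {
    // Assumed catalog settings (hypothetical local Hadoop catalog).
    Map<String, Object> catalogProperties = new HashMap<>();
    catalogProperties.put("type", "hadoop");
    catalogProperties.put("warehouse", "file:///tmp/warehouse");

    Map<String, Object> config = new HashMap<>();
    config.put("table", table);
    config.put("catalog_name", "example_catalog");
    config.put("catalog_properties", catalogProperties);
    // How often data is committed and a new snapshot is created, in seconds.
    config.put("triggering_frequency_seconds", triggeringFrequencySeconds);
    return config;
  }

  public static void main(String[] args) {
    Map<String, Object> config = buildConfig("db.events", 60);
    // In a pipeline with the Beam SDK on the classpath, this map would be passed as:
    //   rows.apply(Managed.write(Managed.ICEBERG).withConfig(config));
    System.out.println(config.get("triggering_frequency_seconds"));
  }
}
```

A smaller frequency should produce more frequent commits and therefore more snapshots, so the value is a trade-off between data freshness and table metadata growth.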

@github-actions github-actions bot added the build label Sep 13, 2024
@ahmedabu98
Contributor Author

assign set of reviewers

Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java.
R: @Abacn for label build.
R: @Abacn for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@@ -307,4 +314,38 @@ public void testWritePartitionedData() {
    assertThat(
        returnedRecords, containsInAnyOrder(INPUT_ROWS.stream().map(RECORD_FUNC::apply).toArray()));
  }

  @Test
  public void testStreamingWrite() {
Member

I suggest a couple more tests where the user has set up their PCollection differently, like if it started out with accumulating mode, or if they set a weird trigger in the middle of their pipeline.

Contributor Author

Added a test with fixed windows and accumulating mode. Let me know if there's anything in particular we should test for.
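One reason an accumulating-mode test is worth having: in Beam, an accumulating pane contains every element seen so far in the window, while a discarding pane contains only the elements that arrived since the previous firing. The toy model below is plain Java with no Beam dependency and does not reflect how the Iceberg sink is implemented. It shows how a sink that naively appended every pane would write duplicate rows under accumulating mode. The names `PaneModeSketch`, `panes`, and `rowsAppended` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PaneModeSketch {

  // Models the panes emitted for one window, given the elements arriving before each firing.
  static List<List<String>> panes(List<List<String>> arrivals, boolean accumulating) {
    List<List<String>> out = new ArrayList<>();
    List<String> seen = new ArrayList<>();
    for (List<String> batch : arrivals) {
      seen.addAll(batch);
      // Accumulating: everything seen so far. Discarding: only the new elements.
      out.add(accumulating ? new ArrayList<>(seen) : new ArrayList<>(batch));
    }
    return out;
  }

  // Rows a naive sink would write if it appended the full contents of every pane.
  static int rowsAppended(List<List<String>> panes) {
    int total = 0;
    for (List<String> pane : panes) {
      total += pane.size();
    }
    return total;
  }

  public static void main(String[] args) {
    // Two firings: "a" and "b" arrive before the first, "c" before the second.
    List<List<String>> arrivals = List.of(List.of("a", "b"), List.of("c"));
    System.out.println(rowsAppended(panes(arrivals, false))); // prints 3
    System.out.println(rowsAppended(panes(arrivals, true)));  // prints 5
  }
}
```

With three distinct input elements, the discarding case appends three rows and the accumulating case appends five, so a test that starts from an accumulating PCollection can check that the written table still contains each record exactly once.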

@ahmedabu98
Contributor Author

Thanks @kennknowles, this is ready for another look

@ahmedabu98 ahmedabu98 merged commit 75a4637 into apache:master Sep 19, 2024
26 checks passed
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
* iceberg streaming writes

* cleanup

* adress comments